Go Big or Go Home Part 2 - Working and Visualising on the Cluster

In this notebook we finally process our larger area. We're going to need some better visualisation tools, and it would be great not to bring the results back to the Jupyter notebook but to leverage the dask cluster's resources during visualisation. We'll be using some dask-aware visualisation libraries (holoviews and datashader) to do the heavy lifting.

Let's begin by starting up our cluster and sizing it appropriately to our computational task.

Tassie time!

All the code here is the same as the conclusion of the previous notebook, except we'll make the cluster bigger: 10 workers instead of 4. We'll also wrap the masking and NDVI calculation in a Python function, since we won't be changing that code from now on.

We'll use the same ROI and time period for this run, applying the techniques covered so far to reduce the computation time:

  1. Dask chunk size selection
  2. Only loading the measurements we intend to use in this calculation, to save on task graph optimisation time
In [1]:
# Initialize the Gateway client
from dask.distributed import Client
from dask_gateway import Gateway

number_of_workers = 10 

gateway = Gateway()

clusters = gateway.list_clusters()
if not clusters:
    print('Creating new cluster. Please wait for this to finish.')
    cluster = gateway.new_cluster()
else:
    print(f'An existing cluster was found. Connecting to: {clusters[0].name}')
    cluster=gateway.connect(clusters[0].name)

cluster.scale(number_of_workers)

client = cluster.get_client()
client
An existing cluster was found. Connecting to: easihub.172abe6beae3457fba2c61f3261247d9
Out[1]:

Client

Client-0735f732-0446-11ee-8397-06a3e810ea18

Connection method: Cluster object Cluster type: dask_gateway.GatewayCluster
Dashboard: https://hub.csiro.easi-eo.solutions/services/dask-gateway/clusters/easihub.172abe6beae3457fba2c61f3261247d9/status

Cluster Info

GatewayCluster

  • Name: easihub.172abe6beae3457fba2c61f3261247d9
  • Dashboard: https://hub.csiro.easi-eo.solutions/services/dask-gateway/clusters/easihub.172abe6beae3457fba2c61f3261247d9/status
In [26]:
import pyproj
pyproj.set_use_global_context(True)

import git
import sys, os
from dateutil.parser import parse
from dateutil.relativedelta import relativedelta
from dask.distributed import Client, LocalCluster
import datacube
from datacube.utils import masking
from datacube.utils.aws import configure_s3_access

# EASI defaults
os.environ['USE_PYGEOS'] = '0'
repo = git.Repo('.', search_parent_directories=True).working_tree_dir
if repo not in sys.path: sys.path.append(repo)
from easi_tools import EasiDefaults, notebook_utils
easi = EasiDefaults()
Successfully found configuration for deployment "csiro"
In [3]:
dc = datacube.Datacube()
configure_s3_access(aws_unsigned=False, requester_pays=True, client=client);
In [4]:
# Get the centroid of the coordinates of the default extents
central_lat = sum(easi.latitude)/2
central_lon = sum(easi.longitude)/2
# central_lat = -42.019
# central_lon = 146.615

# Set the buffer to load around the central coordinates
# This is a radial distance, so the bbox spans 2x the buffer in each dimension
buffer = 0.8

# Compute the bounding box for the study area
study_area_lat = (central_lat - buffer, central_lat + buffer)
study_area_lon = (central_lon - buffer, central_lon + buffer)

# Data product
products = easi.product('landsat')

# Set the date range to load data over
set_time = easi.time
set_time = (set_time[0], parse(set_time[0]) + relativedelta(years=1))
# set_time = ("2021-01-01", "2021-12-31")

# Selected measurement names (used in this notebook)
nir = easi.nir('landsat')        # If defined, else None
red = easi.red('landsat')        # If defined, else None
if nir is None: nir = 'nir08'    # USGS Landsat
if red is None: red = 'red'      # USGS Landsat

# Set the QA band name and mask values
good_pixel_flags = {                       # USGS Landsat
    'nodata': False,
    'cloud': 'not_high_confidence',
    'cloud_shadow': 'not_high_confidence',
    'water': 'land_or_cloud'
}
qa_band = easi.qa_band('landsat')           # If defined, else None
qa_mask = easi.qa_mask('landsat')           # If defined, else None
if qa_band is None: qa_band = 'qa_pixel'    # USGS Landsat
if qa_mask is None: qa_mask = good_pixel_flags

# Set the measurements/bands to load. `None` will load all of them
measurements = [qa_band, red, nir]

# Set the resampling method for the bands
resampling = {qa_band: "nearest", "*": "average"}

# Set the coordinate reference system and output resolution
set_crs = easi.crs('landsat')  # If defined, else None
set_resolution = easi.resolution('landsat')  # If defined, else None
# set_crs = "epsg:3577"
# set_resolution = (-30, 30)

# Set the scene group_by method
group_by = "solar_day"
In [5]:
def masked_seasonal_ndvi(dataset):
    # Identify good pixels according to the QA mask
    cloud_free_mask = masking.make_mask(dataset[qa_band], **qa_mask)
    # Apply the mask
    cloud_free = dataset.where(cloud_free_mask)

    # Calculate the components that make up the NDVI calculation
    band_diff = cloud_free[nir] - cloud_free[red]
    band_sum = cloud_free[nir] + cloud_free[red]
    # Calculate NDVI
    ndvi = band_diff / band_sum

    return ndvi.groupby("time.season").mean("time")  # Calculate the seasonal mean

dataset = None # clear results from any previous runs
dataset = dc.load(
            product=products,
            x=study_area_lon,
            y=study_area_lat,
            time=set_time,
            measurements=measurements,
            resampling=resampling,
            output_crs=set_crs,
            resolution=set_resolution,
            dask_chunks = {"time":2, "x":3072, "y":3072},
            group_by=group_by,
        )

ndvi_unweighted = masked_seasonal_ndvi(dataset)
In [6]:
print(f"dataset size (GiB) {dataset.nbytes / 2**30:.2f}")
print(f"ndvi_unweighted size (GiB) {ndvi_unweighted.nbytes / 2**30:.2f}")
dataset size (GiB) 7.05
ndvi_unweighted size (GiB) 1.05
In [7]:
client.wait_for_workers(n_workers=10)
In [8]:
%%time
actual_result = ndvi_unweighted.compute()
CPU times: user 1.71 s, sys: 785 ms, total: 2.49 s
Wall time: 44 s

You'll notice the computation time is slightly faster with more workers - we're IO bound, so more workers means more available IO bandwidth and threads. It's not 2-3x faster though - we're wasting a lot of resources because we can't actually use all of that extra power.

Tip: More isn't always better. Be mindful of your computational resource usage and cost. A cluster this size is a tremendous waste for a computational job this small. Size things appropriately.
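
As a rough back-of-envelope check before scaling, you can estimate a worker count from the data volume. This is an illustrative rule of thumb, not an EASI guideline, and the per-worker memory figure is an assumption about your deployment:

```python
from math import ceil

# Illustrative sizing rule of thumb (an assumption, not an EASI guideline):
# provision enough total worker memory to hold the input a few times over,
# leaving headroom for temporaries.
dataset_gib = 7.05       # input size reported by dataset.nbytes above
worker_mem_gib = 8       # assumed memory per worker for your deployment
headroom = 3             # multiplier for temporaries and spill margin

workers_needed = ceil(dataset_gib * headroom / worker_mem_gib)
print(workers_needed)    # 3 workers would comfortably cover this job
```

By this estimate, 3 workers would suffice for the 7 GiB load, which is why the 10-worker cluster is mostly idle.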

And visualise the result

In [9]:
actual_result.sel(season='DJF').plot()
Out[9]:
<matplotlib.collections.QuadMesh at 0x7f484c82bfd0>

We'll save the coordinates of this section from the array (as slices) so we can use them later for visualising the same ROI from a larger dataset.

In [10]:
x_slice = slice(ndvi_unweighted.x[0], ndvi_unweighted.x[-1])
y_slice = slice(ndvi_unweighted.y[0], ndvi_unweighted.y[-1])

Going larger

Let's change the area extent to about 4 degrees square.

In [11]:
# Compute the bounding box for the study area
buffer = 2
# Compute the bounding box for the study area
study_area_lat = (central_lat - buffer, central_lat + buffer)
study_area_lon = (central_lon - buffer, central_lon + buffer)
In [12]:
dataset = None # clear results from any previous runs
dataset = dc.load(
            product=products,
            x=study_area_lon,
            y=study_area_lat,
            time=set_time,
            measurements=measurements,
            resampling=resampling,
            output_crs=set_crs,
            resolution=set_resolution,
            dask_chunks = {"time":2, "x":3072, "y":3072},
            group_by=group_by,
        )

ndvi_unweighted = masked_seasonal_ndvi(dataset)

Before we compute anything let's take a look at our result's shape and size

In [13]:
print(f"dataset size (GiB) {dataset.nbytes / 2**30:.2f}")
print(f"ndvi_unweighted size (GiB) {ndvi_unweighted.nbytes / 2**30:.2f}")
dataset size (GiB) 90.18
ndvi_unweighted size (GiB) 6.56

An order of magnitude larger for the results.

The result is getting on the large side for the notebook node, so we will need to pay attention to data locality and the size of the results being processed. The cluster has a LOT more memory than the notebook node; bring too much back to the notebook and it will crash.

Tip: Be mindful of the size of the results and their data locality.
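
One way to act on this tip is a simple size guard before calling `.compute()`. This is an illustrative pattern, not an EASI utility, and the 4 GiB budget is an assumed figure for the notebook node, not a measured limit:

```python
NOTEBOOK_LIMIT_GIB = 4   # assumed safe budget for results on the notebook node

def safe_to_compute(nbytes, limit_gib=NOTEBOOK_LIMIT_GIB):
    """Return True when a lazy result is small enough to bring back locally."""
    return nbytes / 2**30 <= limit_gib

# The 6.56 GiB seasonal NDVI result exceeds the budget...
print(safe_to_compute(int(6.56 * 2**30)))   # False
# ...so keep it on the cluster (e.g. client.persist) and slice before pulling.
```

When the guard fails, persist the result on the cluster workers and only pull back small selections (such as a single season or an ROI slice).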

Now let's check the shape, tasks and chunks

In [14]:
dataset
Out[14]:
<xarray.Dataset>
Dimensions:      (time: 88, y: 16097, x: 13672)
Coordinates:
  * time         (time) datetime64[ns] 2020-02-01T23:50:22.661832 ... 2021-01...
  * y            (y) float64 -3.777e+06 -3.778e+06 ... -4.26e+06 -4.26e+06
  * x            (x) float64 1.151e+06 1.152e+06 ... 1.562e+06 1.562e+06
    spatial_ref  int32 3577
Data variables:
    oa_fmask     (time, y, x) uint8 dask.array<chunksize=(2, 3072, 3072), meta=np.ndarray>
    nbart_red    (time, y, x) int16 dask.array<chunksize=(2, 3072, 3072), meta=np.ndarray>
    nbart_nir    (time, y, x) int16 dask.array<chunksize=(2, 3072, 3072), meta=np.ndarray>
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref
Per-variable array/chunk summary:
  oa_fmask  : Array 18.04 GiB, Chunk 18.00 MiB, Shape (88, 16097, 13672), Chunks (2, 3072, 3072), 1320 chunks in 2 graph layers
  nbart_red : Array 36.07 GiB, Chunk 36.00 MiB, Shape (88, 16097, 13672), Chunks (2, 3072, 3072), 1320 chunks in 2 graph layers
  nbart_nir : Array 36.07 GiB, Chunk 36.00 MiB, Shape (88, 16097, 13672), Chunks (2, 3072, 3072), 1320 chunks in 2 graph layers

Looking at the red data variable we can see about 36 GiB for the array, 36 MiB per chunk and 1320 chunks. The nir and qa_band variables are similarly shaped and chunked.

The number of tasks is climbing so we can expect an increase in task graph optimisation time.
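
To see why the task count climbs, you can count chunks directly from the shape and chunking; each chunk contributes at least one task per operation. A minimal sketch using the dimensions reported above:

```python
from math import ceil

def n_chunks(shape, chunks):
    """Number of dask chunks for an array with this shape and chunking."""
    total = 1
    for extent, chunk in zip(shape, chunks):
        total *= ceil(extent / chunk)
    return total

shape = (88, 16097, 13672)               # time, y, x from the larger load
print(n_chunks(shape, (2, 3072, 3072)))  # 1320 chunks per measurement
```

Every measurement and every subsequent operation adds roughly this many tasks again, so the graph grows quickly with area.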

Chunk size and tasks seem okay, but we will monitor the dask dashboard in case temporaries cause workers to spill to disk when memory is too full.

The chunking is producing some sliver chunks, particularly on the y axis. Let's modify the y chunk size so these slivers don't exist, as they're inflating the task count and are likely unnecessary. Five chunks vertically is a close fit, so let's expand the chunk size and see what happens. We'll need to check the chunk size afterwards to make sure it doesn't get too large; if it does, we can instead make the chunks smaller to reduce the slivers.
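
To make the slivers concrete, here is a small helper (illustrative, not part of datacube or dask) that reproduces the block sizes dask would create along one axis:

```python
from math import ceil

def chunk_layout(extent, chunk):
    """Block sizes dask produces along one axis for a given chunk size."""
    full, rem = divmod(extent, chunk)
    return [chunk] * full + ([rem] if rem else [])

# y axis: 16097 pixels in 3072-pixel chunks leaves a 737-pixel sliver
print(chunk_layout(16097, 3072))             # [3072, 3072, 3072, 3072, 3072, 737]

# Expanding the chunk so five blocks span the axis removes the sliver
print(chunk_layout(16097, ceil(16097 / 5)))  # [3220, 3220, 3220, 3220, 3217]
```

The sliver block is an extra chunk (and hence extra tasks) along every x and time block it pairs with, which is why removing it shrinks the graph noticeably.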

In [15]:
from math import ceil

y_chunks = ceil(dataset.dims['y']/5)
y_chunks
Out[15]:
3220
In [16]:
dataset = None # clear results from any previous runs
dataset = dc.load(
            product=products,
            x=study_area_lon,
            y=study_area_lat,
            time=set_time,
            measurements=measurements,
            resampling=resampling,
            output_crs=set_crs,
            resolution=set_resolution,
            dask_chunks = {"time":2, "x":3072, "y":y_chunks},
            group_by=group_by,
        )

ndvi_unweighted = masked_seasonal_ndvi(dataset)

Now let's recheck the chunk size and tasks for red.

In [17]:
dataset
Out[17]:
<xarray.Dataset>
Dimensions:      (time: 88, y: 16097, x: 13672)
Coordinates:
  * time         (time) datetime64[ns] 2020-02-01T23:50:22.661832 ... 2021-01...
  * y            (y) float64 -3.777e+06 -3.778e+06 ... -4.26e+06 -4.26e+06
  * x            (x) float64 1.151e+06 1.152e+06 ... 1.562e+06 1.562e+06
    spatial_ref  int32 3577
Data variables:
    oa_fmask     (time, y, x) uint8 dask.array<chunksize=(2, 3220, 3072), meta=np.ndarray>
    nbart_red    (time, y, x) int16 dask.array<chunksize=(2, 3220, 3072), meta=np.ndarray>
    nbart_nir    (time, y, x) int16 dask.array<chunksize=(2, 3220, 3072), meta=np.ndarray>
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref
xarray.Dataset
    • time: 88
    • y: 16097
    • x: 13672
    • time
      (time)
      datetime64[ns]
      2020-02-01T23:50:22.661832 ... 2...
      units :
      seconds since 1970-01-01 00:00:00
      array(['2020-02-01T23:50:22.661832000', '2020-02-07T00:08:29.746565000',
             '2020-02-08T23:56:07.838305000', '2020-02-16T00:02:17.290382000',
             '2020-02-17T23:50:19.161045000', '2020-02-23T00:08:26.324736000',
             '2020-02-24T23:56:04.080505000', '2020-03-03T00:02:12.406717000',
             '2020-03-04T23:50:13.948244000', '2020-03-10T00:08:20.316390000',
             '2020-03-11T23:55:57.760179000', '2020-03-19T00:02:05.042245000',
             '2020-03-20T23:50:06.289323000', '2020-03-26T00:08:11.875805000',
             '2020-03-27T23:55:49.016052000', '2020-04-04T00:01:55.630986000',
             '2020-04-05T23:49:57.314469000', '2020-04-11T00:08:04.012731000',
             '2020-04-12T23:55:41.590227000', '2020-04-20T00:01:49.320007000',
             '2020-04-21T23:49:50.704330000', '2020-04-27T00:07:56.581500000',
             '2020-04-28T23:55:33.837052000', '2020-05-06T00:01:40.434291000',
             '2020-05-07T23:49:41.480071000', '2020-05-13T00:07:50.902838000',
             '2020-05-14T23:55:29.766308000', '2020-05-22T00:01:42.100369000',
             '2020-05-23T23:49:44.775546000', '2020-05-29T00:07:54.100662000',
             '2020-05-30T23:55:33.883807000', '2020-06-07T00:01:49.406114000',
             '2020-06-08T23:49:52.971604000', '2020-06-14T00:08:04.440571000',
             '2020-06-15T23:55:43.948339000', '2020-06-23T00:01:58.499654000',
             '2020-06-24T23:50:01.816337000', '2020-06-30T00:08:12.644411000',
             '2020-07-01T23:55:51.903501000', '2020-07-09T00:02:05.604453000',
             '2020-07-10T23:50:08.654008000', '2020-07-16T00:08:18.906561000',
             '2020-07-17T23:55:57.915501000', '2020-07-25T00:02:10.749330000',
             '2020-07-26T23:50:13.576390000', '2020-08-01T00:08:23.206243000',
             '2020-08-02T23:56:01.995060000', '2020-08-10T00:02:13.898814000',
             '2020-08-11T23:50:16.587883000', '2020-08-17T00:08:28.042251000',
             '2020-08-18T23:56:07.546299000', '2020-08-26T00:02:22.004142000',
             '2020-08-27T23:50:25.301919000', '2020-09-02T00:08:36.120079000',
             '2020-09-03T23:56:15.345595000', '2020-09-11T00:02:28.843787000',
             '2020-09-12T23:50:31.863003000', '2020-09-18T00:08:42.001241000',
             '2020-09-19T23:56:20.951110000', '2020-09-27T00:02:33.511001000',
             '2020-09-28T23:50:36.264585000', '2020-10-04T00:08:45.592469000',
             '2020-10-05T23:56:24.218742000', '2020-10-13T00:02:35.593089000',
             '2020-10-14T23:50:38.005262000', '2020-10-20T00:08:46.517652000',
             '2020-10-21T23:56:24.815860000', '2020-10-29T00:02:34.998859000',
             '2020-10-30T23:50:37.039902000', '2020-11-15T23:50:35.182960000',
             '2020-11-21T00:08:45.275374000', '2020-11-22T23:56:24.187266000',
             '2020-11-30T00:02:36.348713000', '2020-12-01T23:50:38.899628000',
             '2020-12-07T00:08:47.623149000', '2020-12-08T23:56:25.962523000',
             '2020-12-16T00:02:36.231037000', '2020-12-17T23:50:38.275891000',
             '2020-12-23T00:08:45.809734000', '2020-12-24T23:56:23.693900000',
             '2021-01-01T00:02:32.372067000', '2021-01-02T23:50:33.953006000',
             '2021-01-08T00:08:40.343533000', '2021-01-09T23:56:17.797994000',
             '2021-01-17T00:02:25.007159000', '2021-01-18T23:50:26.371653000',
             '2021-01-24T00:08:34.838482000', '2021-01-25T23:56:13.122909000'],
            dtype='datetime64[ns]')
    • y
      (y)
      float64
      -3.777e+06 -3.778e+06 ... -4.26e+06
      units :
      metre
      resolution :
      -30.0
      crs :
      EPSG:3577
      array([-3777495., -3777525., -3777555., ..., -4260315., -4260345., -4260375.])
    • x
      (x)
      float64
      1.151e+06 1.152e+06 ... 1.562e+06
      units :
      metre
      resolution :
      30.0
      crs :
      EPSG:3577
      array([1151475., 1151505., 1151535., ..., 1561545., 1561575., 1561605.])
    • spatial_ref
      ()
      int32
      3577
      spatial_ref :
      PROJCS["GDA94 / Australian Albers",GEOGCS["GDA94",DATUM["Geocentric_Datum_of_Australia_1994",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6283"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4283"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",132],PARAMETER["standard_parallel_1",-18],PARAMETER["standard_parallel_2",-36],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","3577"]]
      grid_mapping_name :
      albers_conical_equal_area
      array(3577, dtype=int32)
    • oa_fmask
      (time, y, x)
      uint8
      dask.array<chunksize=(2, 3220, 3072), meta=np.ndarray>
      units :
      1
      nodata :
      0
      flags_definition :
      {'fmask': {'bits': [0, 1, 2, 3, 4, 5, 6, 7], 'values': {'0': 'nodata', '1': 'valid', '2': 'cloud', '3': 'shadow', '4': 'snow', '5': 'water'}, 'description': 'Fmask'}}
      crs :
      EPSG:3577
      grid_mapping :
      spatial_ref
      Array Chunk
      Bytes 18.04 GiB 18.87 MiB
      Shape (88, 16097, 13672) (2, 3220, 3072)
      Dask graph 1100 chunks in 2 graph layers
      Data type uint8 numpy.ndarray
      13672 16097 88
    • nbart_red
      (time, y, x)
      int16
      dask.array<chunksize=(2, 3220, 3072), meta=np.ndarray>
      units :
      1
      nodata :
      -999
      crs :
      EPSG:3577
      grid_mapping :
      spatial_ref
      Array Chunk
      Bytes 36.07 GiB 37.73 MiB
      Shape (88, 16097, 13672) (2, 3220, 3072)
      Dask graph 1100 chunks in 2 graph layers
      Data type int16 numpy.ndarray
      13672 16097 88
    • nbart_nir
      (time, y, x)
      int16
      dask.array<chunksize=(2, 3220, 3072), meta=np.ndarray>
      units :
      1
      nodata :
      -999
      crs :
      EPSG:3577
      grid_mapping :
      spatial_ref
      Array Chunk
      Bytes 36.07 GiB 37.73 MiB
      Shape (88, 16097, 13672) (2, 3220, 3072)
      Dask graph 1100 chunks in 2 graph layers
      Data type int16 numpy.ndarray
      13672 16097 88
    • time
      PandasIndex
      PandasIndex(DatetimeIndex(['2020-02-01 23:50:22.661832', '2020-02-07 00:08:29.746565',
                     '2020-02-08 23:56:07.838305', '2020-02-16 00:02:17.290382',
                     '2020-02-17 23:50:19.161045', '2020-02-23 00:08:26.324736',
                     '2020-02-24 23:56:04.080505', '2020-03-03 00:02:12.406717',
                     '2020-03-04 23:50:13.948244', '2020-03-10 00:08:20.316390',
                     '2020-03-11 23:55:57.760179', '2020-03-19 00:02:05.042245',
                     '2020-03-20 23:50:06.289323', '2020-03-26 00:08:11.875805',
                     '2020-03-27 23:55:49.016052', '2020-04-04 00:01:55.630986',
                     '2020-04-05 23:49:57.314469', '2020-04-11 00:08:04.012731',
                     '2020-04-12 23:55:41.590227', '2020-04-20 00:01:49.320007',
                     '2020-04-21 23:49:50.704330', '2020-04-27 00:07:56.581500',
                     '2020-04-28 23:55:33.837052', '2020-05-06 00:01:40.434291',
                     '2020-05-07 23:49:41.480071', '2020-05-13 00:07:50.902838',
                     '2020-05-14 23:55:29.766308', '2020-05-22 00:01:42.100369',
                     '2020-05-23 23:49:44.775546', '2020-05-29 00:07:54.100662',
                     '2020-05-30 23:55:33.883807', '2020-06-07 00:01:49.406114',
                     '2020-06-08 23:49:52.971604', '2020-06-14 00:08:04.440571',
                     '2020-06-15 23:55:43.948339', '2020-06-23 00:01:58.499654',
                     '2020-06-24 23:50:01.816337', '2020-06-30 00:08:12.644411',
                     '2020-07-01 23:55:51.903501', '2020-07-09 00:02:05.604453',
                     '2020-07-10 23:50:08.654008', '2020-07-16 00:08:18.906561',
                     '2020-07-17 23:55:57.915501', '2020-07-25 00:02:10.749330',
                     '2020-07-26 23:50:13.576390', '2020-08-01 00:08:23.206243',
                     '2020-08-02 23:56:01.995060', '2020-08-10 00:02:13.898814',
                     '2020-08-11 23:50:16.587883', '2020-08-17 00:08:28.042251',
                     '2020-08-18 23:56:07.546299', '2020-08-26 00:02:22.004142',
                     '2020-08-27 23:50:25.301919', '2020-09-02 00:08:36.120079',
                     '2020-09-03 23:56:15.345595', '2020-09-11 00:02:28.843787',
                     '2020-09-12 23:50:31.863003', '2020-09-18 00:08:42.001241',
                     '2020-09-19 23:56:20.951110', '2020-09-27 00:02:33.511001',
                     '2020-09-28 23:50:36.264585', '2020-10-04 00:08:45.592469',
                     '2020-10-05 23:56:24.218742', '2020-10-13 00:02:35.593089',
                     '2020-10-14 23:50:38.005262', '2020-10-20 00:08:46.517652',
                     '2020-10-21 23:56:24.815860', '2020-10-29 00:02:34.998859',
                     '2020-10-30 23:50:37.039902', '2020-11-15 23:50:35.182960',
                     '2020-11-21 00:08:45.275374', '2020-11-22 23:56:24.187266',
                     '2020-11-30 00:02:36.348713', '2020-12-01 23:50:38.899628',
                     '2020-12-07 00:08:47.623149', '2020-12-08 23:56:25.962523',
                     '2020-12-16 00:02:36.231037', '2020-12-17 23:50:38.275891',
                     '2020-12-23 00:08:45.809734', '2020-12-24 23:56:23.693900',
                     '2021-01-01 00:02:32.372067', '2021-01-02 23:50:33.953006',
                     '2021-01-08 00:08:40.343533', '2021-01-09 23:56:17.797994',
                     '2021-01-17 00:02:25.007159', '2021-01-18 23:50:26.371653',
                     '2021-01-24 00:08:34.838482', '2021-01-25 23:56:13.122909'],
                    dtype='datetime64[ns]', name='time', freq=None))

The memory per chunk increases only marginally, but the task count has dropped from 5526 to 4661. This reduction applies to every measurement and operation, so the overall benefit is significant.
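As an illustrative sketch (using toy dask arrays rather than the real satellite data, with shapes chosen to match one NDVI time slice), you can count the tasks in a graph with `len(arr.dask)` and see how chunk size drives task count:

```python
import dask.array as da

# Toy arrays with the same overall shape as one NDVI time slice,
# chunked two different ways (the chunk shapes are illustrative).
small_chunks = da.zeros((16097, 13672), chunks=(1024, 1024))
large_chunks = da.zeros((16097, 13672), chunks=(3220, 3072))

# Every chunk becomes at least one task per operation, so fewer,
# larger chunks mean a smaller task graph to optimise and schedule.
print(len((small_chunks + 1).dask))  # several hundred tasks
print(len((large_chunks + 1).dask))  # a few dozen tasks
```

The same trade-off plays out here: larger chunks cost a little more memory each but shrink the graph that the scheduler must optimise.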

In [18]:
ndvi_unweighted
Out[18]:
<xarray.DataArray (season: 4, y: 16097, x: 13672)>
dask.array<stack, shape=(4, 16097, 13672), dtype=float64, chunksize=(1, 3220, 3072), chunktype=numpy.ndarray>
Coordinates:
  * y            (y) float64 -3.777e+06 -3.778e+06 ... -4.26e+06 -4.26e+06
  * x            (x) float64 1.151e+06 1.152e+06 ... 1.562e+06 1.562e+06
    spatial_ref  int32 3577
  * season       (season) object 'DJF' 'JJA' 'MAM' 'SON'
(array totals: 6.56 GiB, shape (4, 16097, 13672), float64; chunks: 75.47 MiB, shape (1, 3220, 3072); 100 chunks in 32 graph layers)
The total task count is under 100,000 so the scheduler should cope, but task graph optimisation will take a while. As noted previously, the resulting array is too big to bring back to the notebook node comfortably.

The spatial shape is y: 16097, x: 13672. Standard plots aren't going to work well for visualising a result this size in the notebook, and the result uses a fair amount of memory, so we'll need a different approach.

For now, let's visualise the same ROI as the small area from before. We stashed that ROI in x_slice and y_slice.

If you haven't already, open the dask dashboard so you can watch the cluster make progress.

The code for this visualisation is basically the same as before, except now we specify a slice.

In [19]:
%%time
ndvi_unweighted.sel(season='DJF', x=x_slice, y=y_slice).compute().plot()
CPU times: user 3.93 s, sys: 532 ms, total: 4.47 s
Wall time: 34.7 s
Out[19]:
<matplotlib.collections.QuadMesh at 0x7f48445c29e0>

The computation time is relatively short since we are only materialising the result for a subset of the overall dataset.

Visualising all of the data¶

To visualise all of the data we will make use of the dask cluster and some dask-aware visualisation capabilities from the holoviews and datashader python libraries. These libraries provide interactive visualisation that leaves the large dataset on the cluster and transmits only the final rendered view to the Jupyter notebook. This happens on the fly, so the user can zoom and pan around the dataset in all dimensions while the dask cluster automatically scales the data to fit the viewport. The details and advanced features are beyond the scope of this dask and ODC course, but the manuals are extensive and the basic example here is both powerful and useful.

Tip: The datashader pipeline page provides an excellent summary of what's going on.

compute() and persist()¶

The first thing we will do is persist() the results of our calculation to the cluster. This will materialise the results but will keep the result on the cluster (so all lazy tasks are calculated, just like compute() but data locality remains on the cluster). This will ensure the result is readily available for the visualisation. The cluster has plenty of (distributed) memory so there is no reason not to materialise the result on the cluster.

persist() is non-blocking, so it returns as soon as task graph optimisation (which is performed in the notebook kernel) is complete. Run the next cell and you will see it takes a few seconds to optimise the task graph, after which the Jupyter notebook is available for use again. Meanwhile, the dask dashboard will show tasks running as the result is computed and kept on the cluster.

In [20]:
%%time
on_cluster_result = ndvi_unweighted.persist()
CPU times: user 1.19 s, sys: 23.8 ms, total: 1.22 s
Wall time: 1.2 s

on_cluster_result will still display as a dask array, not the actual values. Think of it as a handle linking the Jupyter client to the result held on the dask cluster.
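If a later step needs the persisted data to actually be finished, you can block on the futures behind that handle with dask.distributed's wait and futures_of. A minimal self-contained sketch, using an in-process client and a toy array as stand-ins for the gateway client and ndvi_unweighted:

```python
import dask.array as da
from dask.distributed import Client, futures_of, wait

client = Client(processes=False)  # stand-in for the gateway client

arr = da.ones((1000, 1000), chunks=(250, 250))  # stand-in for ndvi_unweighted

# persist() returns immediately with a handle to the (still-computing) result...
persisted = arr.mean(axis=0).persist()

# ...and wait() blocks until all of its futures have finished on the cluster.
wait(futures_of(persisted))

client.close()
```

In this notebook we don't need to block: we can carry on while the cluster computes, and any later operation on the handle will simply use whichever pieces are ready.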

In [21]:
on_cluster_result
Out[21]:
<xarray.DataArray (season: 4, y: 16097, x: 13672)>
dask.array<stack, shape=(4, 16097, 13672), dtype=float64, chunksize=(1, 3220, 3072), chunktype=numpy.ndarray>
Coordinates:
  * y            (y) float64 -3.777e+06 -3.778e+06 ... -4.26e+06 -4.26e+06
  * x            (x) float64 1.151e+06 1.152e+06 ... 1.562e+06 1.562e+06
    spatial_ref  int32 3577
  * season       (season) object 'DJF' 'JJA' 'MAM' 'SON'
(array totals: 6.56 GiB, shape (4, 16097, 13672), float64; chunks: 75.47 MiB, shape (1, 3220, 3072); 100 chunks in 1 graph layer)

The cluster will go off and do the computation; meanwhile we can continue here. Let's import holoviews and datashader. We'll be using datashader's rasterize to scale the full dataset, which is 16097 x 13672 pixels, down to the roughly 800-pixel-wide view displayed in the notebook. Notice also that no bounds are set on the dataset: we are viewing the entire result, including the season dimension. holoviews provides the interface for pan, zoom and time (season) selection, and you can use the mouse to move around the data.

In [22]:
import holoviews as hv
import xarray as xr
from holoviews import opts
from holoviews.operation.datashader import rasterize

hv.extension("bokeh", width=800)

holoviews expects the xarray.DataArray to have a name, so let's give it one via the name attribute.

In [23]:
on_cluster_result.name = "ndvi"
on_cluster_result
Out[23]:
<xarray.DataArray 'ndvi' (season: 4, y: 16097, x: 13672)>
dask.array<stack, shape=(4, 16097, 13672), dtype=float64, chunksize=(1, 3220, 3072), chunktype=numpy.ndarray>
Coordinates:
  * y            (y) float64 -3.777e+06 -3.778e+06 ... -4.26e+06 -4.26e+06
  * x            (x) float64 1.151e+06 1.152e+06 ... 1.562e+06 1.562e+06
    spatial_ref  int32 3577
  * season       (season) object 'DJF' 'JJA' 'MAM' 'SON'
(array totals: 6.56 GiB, shape (4, 16097, 13672), float64; chunks: 75.47 MiB, shape (1, 3220, 3072); 100 chunks in 1 graph layer)
In [24]:
aspect = on_cluster_result.sizes['y']/on_cluster_result.sizes['x']
width = 800
height = int(width*aspect)

ndvi_seasonal_mean_ds = on_cluster_result.to_dataset(
    name="ndvi_seasonal_mean"
)  # holoviews works better with datasets so let's convert the xarray DataArray holding ndvi into a Dataset

The next cell will display the result once it's ready. The rasterize function calculates a representative pixel for display from the full array on the dask cluster. If you monitor the dashboard you will see small bursts of activity across the workers, and some waiting while data transfers bring the summary information back to the Jupyter notebook.

You can use the controls on the right to pan and zoom around the full image. If you zoom in, rasterize will take a moment to compute a new summary for the current zoom level and show more or less detail accordingly; the same applies when panning.

In [25]:
hv_ds = hv.Dataset(ndvi_seasonal_mean_ds)

rasterize(
    hv_ds.to(hv.Image, ["x", "y"], "ndvi_seasonal_mean", ["season"]).opts(
        title="NDVI Seasonal Mean",
        cmap="RdYlGn", # NDVI: the higher the value, the greener the colour
        clim=(-0.5, 1.0), # clamp the colour range to enhance contrast
        colorbar=True,
        width=width,
        height=height
    )
, precompute=True)
Out[25]:

Be a good dask user - Clean up the cluster resources¶

Disconnecting your client is good practice, but the cluster will still be running, so we need to shut it down as well.

In [27]:
client.close()

cluster.shutdown()